GPU-based Parallel Hybrid Genetic Algorithms
Authors
Abstract
In recent years, interest in hybrid metaheuristics has risen considerably in the field of optimization. Combinations of algorithms such as genetic algorithms (GAs) and local search (LS) methods have yielded very powerful search algorithms. However, because of their complexity, the computational time of the search remains prohibitive when large problem instances are to be solved. GPU-based parallel computing is therefore needed as a complementary way to speed up the search. This paper presents a new methodology for designing and implementing hybrid genetic algorithms efficiently and effectively on GPU accelerators.
I. SCHEME OF PARALLELIZATION
Adapting hybrid GAs to the GPU requires taking into account both the characteristics and underlying issues of the GPU architecture and the parallel models of metaheuristics. Since the evaluation of the neighborhood is generally the most time-consuming part of hybrid GAs, we focus on redesigning LS algorithms for the GPU (see Fig. 1). We propose a three-level decomposition of the GPU adapted to the popular parallel iteration-level model [1] (generation and evaluation of the neighborhood in parallel), which allows a clear separation of the concepts involved in managing the GPU memory hierarchy (see Fig. 2).
In the high-level layer, the CPU sends the number of expected running threads to the GPU, then candidate neighbors are generated and evaluated on the GPU (at the intermediate and low levels), and finally the newly evaluated solutions are returned to the host. This model can be seen as a cooperative model between the CPU and the GPU: the GPU is used as a coprocessor in a synchronous manner. The resource-consuming part, i.e. the incremental evaluation kernel, is computed by the GPU and the rest is handled by the CPU.
The intermediate-level layer focuses on the generation of the LS neighborhood on GPU.
This generation is performed dynamically, which means that no explicit structure needs to be allocated or copied (unlike traditional GAs on GPU [2]). Consequently, only the representation of the candidate solution that generates the neighborhood must be copied from the CPU to the GPU. The main difficulty of the intermediate-level layer is therefore to find an efficient mapping between a GPU thread and an LS neighbor candidate solution. In other words, the issue is to decide which neighbor must be handled by which thread. The answer depends on the solution representation.
Finally, the low-level layer handles GPU memory management for the computation of the evaluation function. Using texture memory is one way to reduce the memory transactions caused by non-coalesced accesses (matrices, the solution that generates the neighborhood). Indeed, texture memory can be seen as a relaxed mechanism for the thread processors to access global memory, because the coalescing requirements do not apply to texture memory accesses.
II. EXPERIMENTATION
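The thread-to-neighbor mapping described for the intermediate-level layer can be illustrated with a small sketch. It is not the authors' implementation. It assumes a permutation representation with a pairwise-swap neighborhood of size n(n-1)/2, which the abstract does not specify. The threads are emulated sequentially on the CPU in Python rather than launched as a GPU kernel. The names `thread_to_swap`, `toy_cost` and `evaluate_neighborhood` are hypothetical, and the objective is a placeholder that is fully recomputed, not the incremental evaluation mentioned in the text.

```python
import math


def thread_to_swap(tid, n):
    """Map a flat thread index to the swap pair (i, j) with i < j."""
    m = n * (n - 1) // 2              # neighborhood size = number of threads
    if not 0 <= tid < m:
        raise ValueError("thread index outside the neighborhood")
    r = m - 1 - tid                   # index counted from the end of the triangle
    k = (math.isqrt(8 * r + 1) - 1) // 2   # triangular row containing r
    i = n - 2 - k
    j = n - 1 - (r - k * (k + 1) // 2)
    return i, j


def toy_cost(perm):
    """Placeholder objective (assumption): total displacement from identity."""
    return sum(abs(value - pos) for pos, value in enumerate(perm))


def evaluate_neighborhood(perm):
    """Emulate one kernel launch: one 'thread' per neighbor, run in a loop."""
    n = len(perm)
    results = []
    for tid in range(n * (n - 1) // 2):
        i, j = thread_to_swap(tid, n)
        # Each thread derives its neighbor from the current solution only;
        # no neighborhood structure is stored or copied beforehand.
        neighbor = list(perm)
        neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
        results.append((toy_cost(neighbor), (i, j)))
    return results


# Host side: collect the evaluated neighbors and pick the best move.
best_cost, best_move = min(evaluate_neighborhood([2, 0, 3, 1]))
```

Because each index is decoded arithmetically, a thread needs only its own identifier and the current solution to know which neighbor it owns, which is the property the dynamic generation relies on. A different representation (for example a binary vector with single-bit flips) would need a different decoding function.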